A New Distributed Caching Technique for Accelerating the Web Query Processing
Abstract
Because the volume of web documents has grown rapidly over the past decades, the efficiency of web search engines has become more crucial than ever. This efficiency can be assessed by two factors: the relevance of the returned search results to the query, and the financial cost of query processing. Of the two, ways to improve query relevance have been studied intensively in research areas such as hyperlink-based ranking, topic-sensitive document classification, and semantic awareness in rank evaluation. However, no studies have provided an efficient solution for cutting the financial cost of query processing while retaining high query relevance. In this light, we propose a distributed cache scheme and a server-clustering technique that can be used to reduce the query processing cost. With the help of these techniques for accelerating web query processing, we saved around 70% of the server cost of a commercial web search engine implemented in South Korea. We believe that our experience can offer valuable insight to anyone who wants to develop a large-scale search engine.
Related resources
Accelerating Database Processing at Database-Driven Web Sites
Most commercial Web sites dynamically generate their contents through a three-tier server architecture composed of a Web server, an application server, and a database server. In such an architecture, the database server easily becomes a bottleneck to the overall performance. In this paper, we propose WDBAccel, a high-performance database server accelerator that significantly improves the throug...
Query-Driven Indexing in Large-Scale Distributed Systems
Efficient and effective search in large-scale data repositories requires complex indexing solutions deployed on a large number of servers. Web search engines such as Google and Yahoo! already rely upon complex systems to be able to return relevant query results and keep processing times within the comfortable sub-second limit. Nevertheless, the exponential growth of the amount of content on the...
Signature File Methods for Semantic Query Caching
In digital libraries accessing distributed Web-based bibliographic repositories, performance is a major issue. Efficient query processing requires an appropriate caching mechanism. Unfortunately, standard page-based as well as tuple-based caching mechanisms designed for conventional databases are not efficient on the Web, where keyword-based querying is often the only way to retrieve data. Therefo...
Optimizing Web Queries through Semantic Caching
In Web-based searching systems that access distributed information providers, efficient query processing requires an advanced caching mechanism to reduce the query response time. The keyword-based querying is often the only way to retrieve data from Web providers, and therefore standard page-based and tuple-based caching mechanisms turn out to be inefficient for such a task. In this work, we develo...
Hash-Based Query Caching Method for Distributed Web Caching in Wide Area Networks
Distributed Web caching allows multiple clients to quickly access a pool of popular Web pages. Conventional distributed Web caching schemes, e.g., the Internet cache protocol and hash routing, require the sending of many query messages among cache servers and/or impose a large load on the cache servers when they are widely dispersed. To overcome these problems, we propose a hash-based query cac...
Publication date: 2013